Science Concierge: A fast content-based recommendation system for scientific publications
Finding relevant publications is important for scientists who have to cope
with an exponentially growing volume of scholarly material. Algorithms can
help with this task, as they do for music, movie, and product recommendations.
However, we know little about how well these algorithms perform on
scholarly material. Here, we develop an algorithm, and an accompanying Python
library, that implements a recommendation system based on the content of
articles. The design principles are to adapt to new content, provide
near-real-time suggestions, and be open source. We tested the library on 15K
posters from the Society for Neuroscience 2015 conference. Human-curated topics
are used to cross-validate the algorithm's parameters and to produce a
similarity metric that maximally correlates with human judgments. We show that
our algorithm significantly outperformed suggestions based on keywords. The work
presented here promises to make the exploration of scholarly material faster
and more accurate.
Comment: 12 pages, 5 figures
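The content-based idea in the abstract can be illustrated with a minimal sketch. It is not the library's implementation: it assumes plain TF-IDF weighting and cosine similarity, and it omits the dimensionality-reduction step mentioned in the figure captions. The function names (`tfidf_vectors`, `cosine`, `recommend`) and the toy abstracts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dictionaries."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption); terms found in every document get weight 0.
        vectors.append({
            term: count * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_index, vectors, top_k=2):
    """Rank all other documents by similarity to the query document."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]


# Toy data, invented for illustration only.
abstracts = [
    "spike timing in cortical neurons",
    "cortical neurons and spike trains",
    "gene expression in zebrafish development",
    "zebrafish gene regulation during development",
]
vectors = tfidf_vectors([text.split() for text in abstracts])
nearest_to_first = recommend(0, vectors, top_k=1)
```

Here `nearest_to_first` holds the index of the abstract that shares the most distinctive terms with the first one. A real system would add tokenization, stop-word handling, and an index for fast nearest-neighbor lookup.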
Finding best parameters to weigh relevant and non-relevant votes.
<p>Performance of the system as a function of the alpha and beta parameters for non-relevant documents at a dislike distance of 1 (A), 2 (B), and 3 (C) in the human-curated topics.</p>
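As a rough illustration of how two weights can balance relevant and non-relevant votes, the sketch below follows the general shape of a Rocchio-style update. It assumes that alpha scales the mean of the liked document vectors and beta scales the mean of the disliked ones. That role assignment, the default values, and the function names are assumptions and may differ from the formula used in the paper.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of dense vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_preference(liked, disliked, alpha=1.0, beta=0.5):
    """Combine votes into one preference vector.

    Assumed form: alpha * mean(liked) - beta * mean(disliked).
    """
    if not liked and not disliked:
        raise ValueError("at least one vote is required")
    dim = len((liked or disliked)[0])
    pos = mean_vector(liked, dim)
    neg = mean_vector(disliked, dim)
    return [alpha * p - beta * n for p, n in zip(pos, neg)]


# Two liked documents and one disliked document in a toy 2-D space.
preference = rocchio_preference(
    liked=[[1.0, 0.0], [3.0, 0.0]],
    disliked=[[0.0, 2.0]],
    alpha=1.0,
    beta=0.5,
)
```

The resulting vector moves toward the liked documents and away from the disliked one. Candidate documents could then be ranked by their similarity to it.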
Number of SVD components vs. performance of the algorithm to capture human curated topics.
<p>The number of LSA components vs. the average human curated tree distance of suggested posters.</p>
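One way to read "human curated tree distance" is as the number of levels separating two topic labels in a hierarchy. The sketch below assumes dot-separated, three-level topic codes such as `A.01.b`. That label format and both function names are hypothetical and chosen only for illustration.

```python
def tree_distance(topic_a, topic_b, sep="."):
    """Levels between a topic and its deepest shared ancestor with another topic."""
    path_a = topic_a.split(sep)
    path_b = topic_b.split(sep)
    shared = 0
    # Count how many leading levels the two paths have in common.
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return max(len(path_a), len(path_b)) - shared


def average_distance(query_topic, suggested_topics):
    """Mean tree distance from the query topic to each suggested topic."""
    distances = [tree_distance(query_topic, t) for t in suggested_topics]
    return sum(distances) / len(distances)


# Hypothetical topic codes: distance 1 (same subtopic group) and 3 (different theme).
avg = average_distance("A.01.b", ["A.01.c", "F.03.a"])
```

Under this reading, a lower average distance means the suggested posters sit closer to the query poster in the curated hierarchy.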
Relationship between human curated distance and topic distance induced by the keyword and Science Concierge models.
<p>Relationship between human curated distance and topic distance induced by the keyword and Science Concierge models.</p>
Comparison of algorithms as they get more relevant documents from a simulated user.
<p>All term weighting schemes except keywords improve recommendations with votes.</p>
Vector representation of documents.
<p>(A) Schematic of the workflow for converting abstracts into vector representations (see Algorithm 1). (B) Schematic of the Rocchio algorithm. (C) Projection of SfN abstract vectors into 2-dimensional space using <i>t</i>-SNE, color-coded by human-curated sessions A to G.</p>